
feat(wire): create connections in batch #865

Draft

Davidson-Souza wants to merge 4 commits into getfloresta:master from Davidson-Souza:feature/connection-batch

Conversation

@Davidson-Souza (Member)

Description and Notes

Sometimes when we initialize new connections, our good addresses table
might not be well populated. This causes our node to attempt the "normal"
addresses, the ones we still haven't tried. As a consequence, our node
needs to attempt several addresses before finding a reachable one.

This commit creates connection attempts in batches of 10, so we try
more addresses in a shorter period of time. To make sure we don't
keep too many in-flight attempts at a time, I've decreased the connection
timeout, so only peers that reply in a timely manner are kept.

I've also added logic to `handle_peer_ready` to disconnect peers if
we already have MAX_OUTGOING_PEERS.

Finally, I've made the periodic_job that creates new connections `no_log`, so we stop polluting the terminal.

@Davidson-Souza Davidson-Souza self-assigned this Mar 2, 2026
@Davidson-Souza Davidson-Souza added enhancement New feature or request performance This is a performance-related issue labels Mar 2, 2026
@Davidson-Souza Davidson-Souza force-pushed the feature/connection-batch branch from 898303b to 1db21e3 Compare March 2, 2026 16:50
@JoseSK999 (Member) left a comment:

Opinionated review

Comment on lines +252 to +263
let good_peers_count = self.connected_peers();
if good_peers_count > T::MAX_OUTGOING_PEERS {
debug!(
"Already have {} peers, disconnecting peer to avoid blowing up our max of {}",
self.peers.len(),
T::MAX_OUTGOING_PEERS
);

self.send_to_peer(peer, NodeRequest::Shutdown)?;

return Ok(());
}
Member:

Hm, I think it's better to take into account how many peers we could connect to before exceeding the limit BEFORE trying more connections, see next comment

Member:

Also, to control excess peers we can disconnect slow peers in IBD and random peers once synced... followup PR I have already done privately

Comment on lines 595 to 598
let connection_kind = ConnectionKind::Regular(required_service);
if self.peers.len() < T::MAX_OUTGOING_PEERS {
for _ in 0..NEW_CONNECTIONS_BATCH_SIZE {
self.create_connection(connection_kind)?;
}
Member:

I had this in a private branch, I think it makes sense:

        // How many peers we can connect to without exceeding the limit, if any
        let peer_capacity = T::MAX_OUTGOING_PEERS.saturating_sub(self.connected_peers());
        let connection_kind = ConnectionKind::Regular(required_service);

        // Try connecting to at most 4 peers (fewer if max capacity would be reached)
        for _ in 0..peer_capacity.min(MAX_OPEN_CONNECTIONS_BATCH) {
            self.create_connection(connection_kind)?;
        }

@Davidson-Souza (Member Author), Mar 2, 2026:

I've tried this, but for the last connections we basically revert to the older logic. I had some runs where I would create 8/9 connections super quick, and then just hang for dozens of seconds.

Using this logic we can get 10 peers for chain selection pretty quickly.

Member:

So you exceed the limit on purpose, then you disconnect the excess peers. Btw the 8/9 fast connections happen to me on master too, idk why it takes longer to get the 10th peer.

Member Author:

> So you exceed the limit on purpose, then you disconnect the excess peers.

Yes! It should not get bigger than 20 tho, because we also abort the connection pretty quickly if they don't respond

> Btw the 8/9 fast connections happen to me on master, idk why it takes longer to get the 10th peer.

For me as well. I think it has to do with the address manager getting populated by the addresses received from new connections? It's really weird tho. At least for me it doesn't happen on this PR

Comment on lines +579 to +581
if self.peers.len() >= T::MAX_OUTGOING_PEERS {
return Ok(());
}
Member:

This then can be removed

@Davidson-Souza (Member Author), Mar 3, 2026:

Then we will keep connecting and disconnecting peers, I think it's fine

Edit: outside of the feelers logic

Comment on lines +53 to +54
/// How many connections we try at once
const NEW_CONNECTIONS_BATCH_SIZE: usize = 10;
Member:

Since we would sometimes try fewer connections, this name is more accurate. I think 4 is kinda nice, but this is just how I feel:

/// The maximum number of connections we try to open in parallel, in `maybe_open_connection`.
const MAX_OPEN_CONNECTIONS_BATCH: usize = 4;

Member Author:

Maybe this could be part of NodeContext. Then use 4 for all but ChainSelector?

@JoseSK999 (Member), Mar 2, 2026:

ACK, what if we used 12 for ChainSelector so that 1-2 failed connections don't make us wait 10 seconds (also a nice multiple of 4). We keep the first 10 peers. Not sure.


impl NodeContext for ChainSelector {
const REQUEST_TIMEOUT: u64 = 60; // Ban peers stalling our IBD
const TRY_NEW_CONNECTION: u64 = 1; // Try creating connections more aggressively
Member:

I would keep this with 4 peers as batch size

Member Author:

I've increased this to make sure we GC the dead peers before attempting a new round. We remove all pending Connects before trying to add more, in order to keep the in-flight connections list small

@Davidson-Souza Davidson-Souza changed the title Feature/connection batch feat(wire): create connections in batch Mar 2, 2026
@jaoleal (Collaborator) left a comment:

CACK

let good_peers_count = self.connected_peers();
if good_peers_count > T::MAX_OUTGOING_PEERS {
debug!(
"Already have {} peers, disconnecting peer to avoid blowing up our max of {}",
Collaborator:

If good_peers_count is the value triggering the log, isn't it more accurate for it to be:

            debug!(
                "Already have {good_peers_count} peers, disconnecting peer to avoid blowing up our max of {}",
                T::MAX_OUTGOING_PEERS
            );

?

Member:

+1


/// Attempt to open a new connection (if needed) every TRY_NEW_CONNECTION seconds
const TRY_NEW_CONNECTION: u64 = 10; // 10 seconds

Member:

What if we move it here:

    /// How long should we wait for a peer to respond our connection request. This shouldn't be
    /// greater than `TRY_NEW_CONNECTION` in order to clear timed-out requests at the same pace.
    const CONNECTION_TIMEOUT: u64 = 10;

Member Author:

Good idea! Will do

@Davidson-Souza Davidson-Souza force-pushed the feature/connection-batch branch 2 times, most recently from 1571ff6 to e991a0f Compare March 3, 2026 00:21
@Davidson-Souza (Member Author):

Pushed 1571ff6:

  • Moved NEW_CONNECTIONS_BATCH_SIZE to NodeContext (didn't rename it to MAX_OPEN_CONNECTIONS_BATCH since this is not a max, but a fixed number) @JoseSK999
  • Changed NEW_CONNECTIONS_BATCH_SIZE to 12 for all but RunningNode, which is 4 @JoseSK999
  • Decreased TRY_NEW_CONNECTION and CONNECTION_TIMEOUT to 2. Now we start downloading headers almost instantly, even on a fresh run (it's surprisingly stable on my end)
  • Changed the debug! to use `good_peers_count` @jaoleal
  • Added a comment on CONNECTION_TIMEOUT about its link to TRY_NEW_CONNECTION

@JoseSK999 (Member):

Isn't using a NEW_CONNECTIONS_BATCH_SIZE of 12 for SyncNode like killing a fly with a nuke? We only use 10 peers there. I think it makes more sense to keep 4 as the default and use 12 only for ChainSelector.

) -> Result<(), WireError> {
// try to connect with manually added peers
self.maybe_open_connection_with_added_peers()?;
if self.peers.len() >= T::MAX_OUTGOING_PEERS {
Member:

Shouldn't we use connected_peers()?

Member Author:

Done!

@Davidson-Souza Davidson-Souza force-pushed the feature/connection-batch branch from e991a0f to 96a1d80 Compare March 3, 2026 16:31
@Davidson-Souza (Member Author):

Pushed 96a1d80 with this diff:

diff --git a/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs b/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
index 068fe30..1ea730f 100644
--- a/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
@@ -140,6 +140,10 @@ pub enum PeerCheck {
 impl NodeContext for ChainSelector {
     const REQUEST_TIMEOUT: u64 = 60; // Ban peers stalling our IBD
 
+    // Since we don't have any peers when this starts, we use a more aggressive batch
+    // size to make sure we get our MAX_OUTGOING_CONNECTIONS ASAP
+    const NEW_CONNECTIONS_BATCH_SIZE: usize = 12;
+
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK | service_flags::UTREEXO.into() | service_flags::UTREEXO_FILTER.into()
     }
diff --git a/crates/floresta-wire/src/p2p_wire/node/conn.rs b/crates/floresta-wire/src/p2p_wire/node/conn.rs
index 6d7202a..7fa3e34 100644
--- a/crates/floresta-wire/src/p2p_wire/node/conn.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/conn.rs
@@ -573,7 +573,7 @@ where
     ) -> Result<(), WireError> {
         // try to connect with manually added peers
         self.maybe_open_connection_with_added_peers()?;
-        if self.peers.len() >= T::MAX_OUTGOING_PEERS {
+        if self.connected_peers() >= T::MAX_OUTGOING_PEERS {
             return Ok(());
         }
 
diff --git a/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs b/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
index e7c91ac..cced16f 100644
--- a/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
@@ -58,10 +58,6 @@ pub struct RunningNode {
 impl NodeContext for RunningNode {
     const REQUEST_TIMEOUT: u64 = 2 * 60;
 
-    // We expect already having several peers after IBD, so we don't need to be too aggressive when
-    // creating new connections
-    const NEW_CONNECTIONS_BATCH_SIZE: usize = 4;
-
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK
             | service_flags::UTREEXO.into()
diff --git a/crates/floresta-wire/src/p2p_wire/node_context.rs b/crates/floresta-wire/src/p2p_wire/node_context.rs
index ea3bfe3..224fb5e 100644
--- a/crates/floresta-wire/src/p2p_wire/node_context.rs
+++ b/crates/floresta-wire/src/p2p_wire/node_context.rs
@@ -84,7 +84,7 @@ pub trait NodeContext {
     const MAINTENANCE_TICK: Duration = Duration::from_secs(1);
 
     /// How many connections we try at once
-    const NEW_CONNECTIONS_BATCH_SIZE: usize = 12;
+    const NEW_CONNECTIONS_BATCH_SIZE: usize = 4;
 
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK

Comment on lines 420 to 423
periodic_job!(
self.last_feeler => self.open_feeler_connection(),
RunningNode::FEELER_INTERVAL,
);
Member:

Missing a `no_log` here?

Comment on lines 256 to 259
periodic_job!(
self.last_feeler => self.open_feeler_connection(),
SyncNode::FEELER_INTERVAL,
);
Member:

And here?

@Davidson-Souza (Member Author):

There's a problem here: if we addnode when we have all our 10 connections, we will disconnect the added node. If we had #636 it would be trivial to fix, since we would just allow added nodes to bypass our quota check

@JoseSK999 (Member) commented Mar 3, 2026:

Righttt, what about a small PR only adding Manual peers, should be easy to review and merge (or even in this PR if it's not that complex to add)

@Davidson-Souza (Member Author):

> Righttt, what about a small PR only adding Manual peers, should be easy to review and merge (or even in this PR if it's not that complex to add)

Hold my coffee, will craft one in a jiffy

@JoseSK999 (Member):

Holding your cubata

@Davidson-Souza Davidson-Souza added this to the v0.9.0 milestone Mar 3, 2026
All peers that are added by the user, i.e. those added with the RPC
`addnode` or the CLI option `--connect`, should have kind `Manual`
rather than `Regular`. This exempts them from:
  - The max peers quota: if we already have 10 peers and the user tries
    to addnode another one, we will now have 11 peers
  - Banning: they are whitelisted and shouldn't be banned for misbehaving
  - Required services: they don't need to advertise any required service

This commit adds this new variant and makes sure we are following
the above rules.

Manual connections are whitelisted and should not be banned, even
if they misbehave. This commit makes sure we don't ban them.

I've removed all the places where we would disconnect a misbehaving peer
with a call to `increase_banscore` using the maximum `banscore` as the
increase amount (so they would be banned right away).

Sometimes when we initialize new connections, our `good addresses` table
might not be well populated. This causes our node to attempt the "normal"
addresses, the ones we still haven't tried. As a consequence, our node
needs to attempt several addresses before finding a reachable one.

This commit creates connection attempts in batches of 10, so we try
more addresses in a shorter period of time. To make sure we don't
keep too many in-flight attempts at a time, I've decreased the connection
timeout, so only peers that reply in a timely manner are kept.

I've also added logic to `handle_peer_ready` to disconnect peers if
we already have MAX_OUTGOING_PEERS.

When we use `periodic_jobs` with `maybe_create_connections` we might
create a lot of `NoAddressAvailable` errors, especially on
signet/testnet/regtest. This offers little information and pollutes
the terminal.

This commit uses the `no_log` attribute from `periodic_job` to avoid
logging these errors to stdout.
@Davidson-Souza Davidson-Souza force-pushed the feature/connection-batch branch from 96a1d80 to 4ce49a7 Compare March 3, 2026 21:17
@Davidson-Souza Davidson-Souza marked this pull request as draft March 3, 2026 21:17
@Davidson-Souza (Member Author):

I've incorporated #866 and it seems to fix #865 (comment). Marking this as draft until #866 gets merged

T::MAX_OUTGOING_PEERS
);

self.send_to_peer(peer, NodeRequest::Shutdown)?;
Member:

Note we have just sent a GetAddresses to this node, but then we disconnect. Idk if we will receive the addresses at all, or just disconnect.

A random idea I got: maybe we can convert these excess peers into feeler peers. We get the addresses and then disconnect.

Idk if this is a good idea tho.

